Stock Market Analysis with Technical Indicators and Predictions¶

In [1]:
# Import modules
import os
import sys
import json
import random
import copy
from pathlib import Path

import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
In [4]:
import yfinance as yf
from pandas_datareader import data as pdr
from datetime import datetime
yf.pdr_override()

Set up the start and end times for the data download¶

In [5]:
end = datetime.now()
start = datetime(end.year - 10, end.month, end.day)
print(start,' ', end)
stock_list = ['AMAT','LRCX','WOLF','KLAC','AAPL', 'GOOG', 'MSFT', 'AMZN']
2013-03-31 00:00:00   2023-03-31 17:57:28.045856

Pull data from yfinance and store it in one DataFrame per ticker. Add a Label column and convert the Date column to datetime.¶

In [4]:
data = []
for stock in stock_list:
    df = yf.download(stock, start, end)
    df = df.reset_index()
    df['Label'] = stock
    df['Date'] = pd.to_datetime(df['Date'])
    data.append(df)
[*********************100%***********************]  1 of 1 completed
[*********************100%***********************]  1 of 1 completed
[*********************100%***********************]  1 of 1 completed
[*********************100%***********************]  1 of 1 completed
[*********************100%***********************]  1 of 1 completed
[*********************100%***********************]  1 of 1 completed
[*********************100%***********************]  1 of 1 completed
[*********************100%***********************]  1 of 1 completed

Examine Data¶

In [5]:
data[0].head()
Out[5]:
Date Open High Low Close Adj Close Volume Label
0 2013-04-01 13.48 13.49 13.27 13.36 11.490361 11026400 AMAT
1 2013-04-02 13.46 13.46 13.19 13.24 11.387155 8535600 AMAT
2 2013-04-03 13.24 13.24 13.02 13.15 11.309752 14428100 AMAT
3 2013-04-04 13.10 13.24 13.03 13.22 11.369953 7399800 AMAT
4 2013-04-05 13.02 13.22 12.91 13.20 11.352753 10349000 AMAT
In [6]:
data[1].head()
Out[6]:
Date Open High Low Close Adj Close Volume Label
0 2013-04-01 41.340000 41.419998 40.700001 40.799999 35.655560 1602800 LRCX
1 2013-04-02 40.910000 41.200001 40.490002 40.689999 35.559414 1992400 LRCX
2 2013-04-03 40.849998 41.090000 40.189999 40.470001 35.367153 2716500 LRCX
3 2013-04-04 40.439999 41.299999 40.270000 41.230000 36.031338 1898600 LRCX
4 2013-04-05 40.610001 40.959999 40.060001 40.770000 35.629326 1645200 LRCX
In [7]:
print(len(data))
8

Add Technical Indicators to the dataframe¶

Calculate these technical indicators

  • RSI
  • Volume (plain)
  • Bollinger Bands
  • Aroon Oscillator
  • Price Volume Trend
  • Acceleration Bands
In [ ]:
TechIndicator = copy.deepcopy(data)

Calculation of Relative Strength Index (RSI)¶

RSI = 100 * Avg(PriceUp) / (Avg(PriceUp) + Avg(PriceDown))¶

where, for each day t in the window:¶

PriceUp(t) = Price(t) - Price(t-1), taken only when Price(t) - Price(t-1) > 0¶

PriceDown(t) = -1 * (Price(t) - Price(t-1)), taken only when Price(t) - Price(t-1) < 0¶

In [ ]:
def rsi(values):
    up = values[values>0].mean()
    down = -1*values[values<0].mean()
    return 100 * up / (up + down)
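
As a quick check of the function above, the sketch below applies it to a small made-up window of daily price changes. Note that this variant averages gains over up days only and losses over down days only, and that a window with no down days (or no up days) yields NaN, which the notebook later fills with 0. The sample numbers are illustrative assumptions, not market data.

```python
import math
import pandas as pd

def rsi(values):
    # Mean of the positive changes and mean magnitude of the negative changes
    up = values[values > 0].mean()
    down = -1 * values[values < 0].mean()
    return 100 * up / (up + down)

# Hypothetical daily price changes (Momentum_1D) for one 5-day window
changes = pd.Series([1.0, -0.5, 1.0, 0.5, -1.0])
rsi_mixed = rsi(changes)  # up mean = 2.5 / 3, down mean = 0.75

# A window with only gains has no "down" average, so the result is NaN
rsi_all_up = rsi(pd.Series([1.0, 2.0]))

print(round(rsi_mixed, 2), rsi_all_up)
```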

Add Momentum_1D column for all stocks.¶

Momentum_1D = P(t) - P(t-1)¶

In [ ]:
for stock in range(len(TechIndicator)):
    TechIndicator[stock]['Momentum_1D'] = (TechIndicator[stock]['Close']-TechIndicator[stock]['Close'].shift(1)).fillna(0)
    TechIndicator[stock]['RSI_14D'] = TechIndicator[stock]['Momentum_1D'].rolling(center=False, window=14).apply(rsi).fillna(0)
TechIndicator[1].tail(5)
Out[ ]:
Date Open High Low Close Adj Close Volume Label Momentum_1D RSI_14D
2515 2023-03-27 508.260010 509.799988 494.679993 495.769989 495.769989 1107100 LRCX -6.290009 69.195968
2516 2023-03-28 495.399994 496.549988 478.769989 485.079987 485.079987 1463600 LRCX -10.690002 68.993074
2517 2023-03-29 494.540009 521.159973 492.260010 515.750000 515.750000 2072200 LRCX 30.670013 70.862550
2518 2023-03-30 523.609985 532.760010 521.960022 531.359985 531.359985 2055300 LRCX 15.609985 71.266370
2519 2023-03-31 527.640015 532.715576 525.390015 528.700012 528.700012 223027 LRCX -2.659973 70.985876
In [ ]:
TechIndicator[1].head(5)
Out[ ]:
Date Open High Low Close Adj Close Volume Label Momentum_1D RSI_14D
0 2013-04-01 41.340000 41.419998 40.700001 40.799999 35.655575 1602800 LRCX 0.000000 0.0
1 2013-04-02 40.910000 41.200001 40.490002 40.689999 35.559422 1992400 LRCX -0.110001 0.0
2 2013-04-03 40.849998 41.090000 40.189999 40.470001 35.367176 2716500 LRCX -0.219997 0.0
3 2013-04-04 40.439999 41.299999 40.270000 41.230000 36.031334 1898600 LRCX 0.759998 0.0
4 2013-04-05 40.610001 40.959999 40.060001 40.770000 35.629337 1645200 LRCX -0.459999 0.0
In [ ]:
import pandas as pd
import seaborn as sns
import matplotlib.pyplot as plt
import plotly
import plotly.express as px
import numpy as np
In [ ]:
# Clean up the 'Volume' column.
for stock in range(len(TechIndicator)):
    TechIndicator[stock]['Volume_plain'] = TechIndicator[stock]['Volume'].fillna(0)
TechIndicator[0].tail()
Out[ ]:
Date Open High Low Close Adj Close Volume Label Momentum_1D RSI_14D Volume_plain
2515 2023-03-27 120.750000 120.980003 118.320000 118.870003 118.870003 4628200 AMAT -0.659996 60.772359 4628200
2516 2023-03-28 118.889999 119.059998 115.580002 116.400002 116.400002 5523800 AMAT -2.470001 61.147988 5523800
2517 2023-03-29 118.720001 121.010002 117.360001 119.849998 119.849998 7865300 AMAT 3.449997 60.986116 7865300
2518 2023-03-30 122.000000 123.379997 121.339996 122.110001 122.110001 6213100 AMAT 2.260002 61.022278 6213100
2519 2023-03-31 121.519997 122.519997 121.000000 122.190002 122.190002 626666 AMAT 0.080002 59.435375 626666

Calculation of Bollinger Bands¶

Calculate the average closing value over a given period (e.g. 30 days)¶

Calculate the standard deviation of the closing value over the same period¶

Calculate the upper band¶

  • upperband = average + standard deviation * number of standard deviations (e.g. 3)

Calculate the lower band¶

  • lowerband = average - standard deviation * number of standard deviations (e.g. 3)
In [ ]:
def bbands(price, length=30, numsd=3):    
    ave = price.rolling(window = length, center = False).mean()    
    sd = price.rolling(window = length, center = False).std()
    upband = ave + (sd*numsd)
    dnband = ave - (sd*numsd)
    return np.round(ave,3), np.round(upband,3), np.round(dnband,3)
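
To make the band arithmetic concrete, here is a small sketch that runs the same calculation on a five-value toy series with a 3-day window and 2 standard deviations. The toy prices and the shortened parameters are assumptions chosen so the numbers can be checked by hand. The first `length - 1` rows have no full window and come out as NaN.

```python
import numpy as np
import pandas as pd

def bbands(price, length=30, numsd=3):
    # Rolling mean and (sample) standard deviation over `length` observations
    ave = price.rolling(window=length).mean()
    sd = price.rolling(window=length).std()
    return np.round(ave, 3), np.round(ave + sd * numsd, 3), np.round(ave - sd * numsd, 3)

prices = pd.Series([1.0, 2.0, 3.0, 4.0, 5.0])
middle, upper, lower = bbands(prices, length=3, numsd=2)

# For the window [1, 2, 3]: mean = 2, sample std = 1, so the bands are 2 +/- 2
print(pd.DataFrame({"middle": middle, "upper": upper, "lower": lower}))
```

Because the notebook fills those leading NaN rows with 0, the plotted bands start at zero for the first 29 trading days; dropping those rows before plotting may give a cleaner chart.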
In [ ]:
for stock in range(len(TechIndicator)):
    TechIndicator[stock]['BB_Middle_Band'], TechIndicator[stock]['BB_Upper_Band'], TechIndicator[stock]['BB_Lower_Band'] = bbands(TechIndicator[stock]['Close'], length=30, numsd=3)
    TechIndicator[stock]['BB_Middle_Band'] = TechIndicator[stock]['BB_Middle_Band'].fillna(0)
    TechIndicator[stock]['BB_Upper_Band'] = TechIndicator[stock]['BB_Upper_Band'].fillna(0)
    TechIndicator[stock]['BB_Lower_Band'] = TechIndicator[stock]['BB_Lower_Band'].fillna(0)
TechIndicator[0].tail()
Out[ ]:
Date Open High Low Close Adj Close Volume Label Momentum_1D RSI_14D Volume_plain BB_Middle_Band BB_Upper_Band BB_Lower_Band
2515 2023-03-27 120.750000 120.980003 118.320000 118.870003 118.870003 4628200 AMAT -0.659996 60.772359 4628200 117.454 128.078 106.830
2516 2023-03-28 118.889999 119.059998 115.580002 116.400002 116.400002 5523800 AMAT -2.470001 61.147988 5523800 117.450 128.078 106.822
2517 2023-03-29 118.720001 121.010002 117.360001 119.849998 119.849998 7865300 AMAT 3.449997 60.986116 7865300 117.497 128.193 106.800
2518 2023-03-30 122.000000 123.379997 121.339996 122.110001 122.110001 6213100 AMAT 2.260002 61.022278 6213100 117.585 128.528 106.642
2519 2023-03-31 121.519997 122.519997 121.000000 122.190002 122.190002 626666 AMAT 0.080002 59.435375 626666 117.812 128.963 106.660

Calculation of Aroon Oscillator¶

Aroon-Up = ((25 - Number of Days Since 25-day High) / 25) * 100¶

Aroon-Down = ((25 - Number of Days Since 25-day Low) / 25) * 100¶

Aroon Oscillator = Aroon-Up - Aroon-Down¶

In [ ]:
def Aroon_Oscillator(df, tf=25):
    aroonup = []
    aroondown = []
    x = tf
    while x< len(df['Date']):
        aroon_up = ((df['High'][x-tf:x].tolist().index(max(df['High'][x-tf:x])))/float(tf))*100
        aroon_down = ((df['Low'][x-tf:x].tolist().index(min(df['Low'][x-tf:x])))/float(tf))*100
        aroonup.append(aroon_up)
        aroondown.append(aroon_down)
        x+=1
    return aroonup, aroondown
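
The function above takes the position of the high and low within the previous `tf` rows (excluding the current day) and uses that index directly. A possible alternative, sketched below, follows the commonly cited definition based on the number of periods since the most recent high or low, with the current day included in the window. The function name `aroon_oscillator` and the toy inputs are assumptions for illustration.

```python
import numpy as np
import pandas as pd

def aroon_oscillator(high, low, period=25):
    # Each window holds the current day plus the `period` days before it
    window = period + 1

    def periods_since_max(w):
        # Reverse so index 0 is the current day; argmax finds the most recent high
        return float(np.argmax(w[::-1]))

    def periods_since_min(w):
        return float(np.argmin(w[::-1]))

    since_high = high.rolling(window).apply(periods_since_max, raw=True)
    since_low = low.rolling(window).apply(periods_since_min, raw=True)
    aroon_up = 100 * (period - since_high) / period
    aroon_down = 100 * (period - since_low) / period
    return aroon_up - aroon_down

rising = pd.Series([1.0, 2.0, 3.0, 4.0, 5.0])
falling = rising[::-1].reset_index(drop=True)
osc_rising = aroon_oscillator(rising, rising, period=2)
osc_falling = aroon_oscillator(falling, falling, period=2)
print(osc_rising.tolist(), osc_falling.tolist())
```

A steadily rising series pins the oscillator at its upper extreme and a steadily falling one at its lower extreme, which is a convenient sanity check for any implementation.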
In [ ]:
for stock in range(len(TechIndicator)):
    listofzeros = [0] * 25
    up, down = Aroon_Oscillator(TechIndicator[stock])
    aroon_list = [x - y for x, y in zip(up,down)]
    if len(aroon_list)==0:
        aroon_list = [0] * TechIndicator[stock].shape[0]
        TechIndicator[stock]['Aroon_Oscillator'] = aroon_list
    else:
        TechIndicator[stock]['Aroon_Oscillator'] = listofzeros+aroon_list
In [ ]:
TechIndicator[0].head()
Out[ ]:
Date Open High Low Close Adj Close Volume Label Momentum_1D RSI_14D Volume_plain BB_Middle_Band BB_Upper_Band BB_Lower_Band Aroon_Oscillator
0 2013-04-01 13.48 13.49 13.27 13.36 11.490359 11026400 AMAT 0.000000 0.0 11026400 0.0 0.0 0.0 0.0
1 2013-04-02 13.46 13.46 13.19 13.24 11.387154 8535600 AMAT -0.120000 0.0 8535600 0.0 0.0 0.0 0.0
2 2013-04-03 13.24 13.24 13.02 13.15 11.309745 14428100 AMAT -0.090000 0.0 14428100 0.0 0.0 0.0 0.0
3 2013-04-04 13.10 13.24 13.03 13.22 11.369956 7399800 AMAT 0.070001 0.0 7399800 0.0 0.0 0.0 0.0
4 2013-04-05 13.02 13.22 12.91 13.20 11.352756 10349000 AMAT -0.020000 0.0 10349000 0.0 0.0 0.0 0.0
In [ ]:
TechIndicator[0].tail()
Out[ ]:
Date Open High Low Close Adj Close Volume Label Momentum_1D RSI_14D Volume_plain BB_Middle_Band BB_Upper_Band BB_Lower_Band Aroon_Oscillator
2515 2023-03-27 120.750000 120.980003 118.320000 118.870003 118.870003 4628200 AMAT -0.659996 60.772359 4628200 117.454 128.078 106.830 76.0
2516 2023-03-28 118.889999 119.059998 115.580002 116.400002 116.400002 5523800 AMAT -2.470001 61.147988 5523800 117.450 128.078 106.822 76.0
2517 2023-03-29 118.720001 121.010002 117.360001 119.849998 119.849998 7865300 AMAT 3.449997 60.986116 7865300 117.497 128.193 106.800 76.0
2518 2023-03-30 122.000000 123.379997 121.339996 122.110001 122.110001 6213100 AMAT 2.260002 61.022278 6213100 117.585 128.528 106.642 72.0
2519 2023-03-31 121.519997 122.519997 121.000000 122.190002 122.190002 626666 AMAT 0.080002 59.435375 626666 117.812 128.963 106.660 68.0

Calculation of Price Volume Trend¶

PVT(t) = PVT(t-1) + ((Close(t) - Close(t-1)) / Close(t-1)) * Volume(t)¶

In [ ]:
for stock in range(len(TechIndicator)):
    TechIndicator[stock]["PVT"] = (TechIndicator[stock]['Momentum_1D']/ TechIndicator[stock]['Close'].shift(1))*TechIndicator[stock]['Volume_plain']
    TechIndicator[stock]["PVT"] = TechIndicator[stock]["PVT"]+TechIndicator[stock]["PVT"].shift(1)
    TechIndicator[stock]["PVT"] = TechIndicator[stock]["PVT"].fillna(0)
TechIndicator[0].tail()
Out[ ]:
Date Open High Low Close Adj Close Volume Label Momentum_1D RSI_14D Volume_plain BB_Middle_Band BB_Upper_Band BB_Lower_Band Aroon_Oscillator PVT
2515 2023-03-27 120.750000 120.980003 118.320000 118.870003 118.870003 4628200 AMAT -0.659996 60.772359 4628200 117.454 128.078 106.830 76.0 -193759.348005
2516 2023-03-28 118.889999 119.059998 115.580002 116.400002 116.400002 5523800 AMAT -2.470001 61.147988 5523800 117.450 128.078 106.822 76.0 -140334.144633
2517 2023-03-29 118.720001 121.010002 117.360001 119.849998 119.849998 7865300 AMAT 3.449997 60.986116 7865300 117.497 128.193 106.800 76.0 118341.689068
2518 2023-03-30 122.000000 123.379997 121.339996 122.110001 122.110001 6213100 AMAT 2.260002 61.022278 6213100 117.585 128.528 106.642 72.0 350280.741247
2519 2023-03-31 121.519997 122.519997 121.000000 122.190002 122.190002 626666 AMAT 0.080002 59.435375 626666 117.812 128.963 106.660 68.0 117570.513104
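
Note that the loop above adds each day's price-volume term only to the previous day's term (a two-day sum) rather than accumulating a running total, which is how Price Volume Trend is usually defined. A minimal sketch of the cumulative form follows, assuming that is the intended indicator; the function name `price_volume_trend` and the sample values are illustrative.

```python
import pandas as pd

def price_volume_trend(close, volume):
    # Daily fractional price change; the first day has no previous close, so use 0
    pct_change = close.pct_change().fillna(0)
    # Running total of (fractional change * volume)
    return (pct_change * volume).cumsum()

close = pd.Series([10.0, 11.0, 9.9])
volume = pd.Series([100, 200, 300])
pvt = price_volume_trend(close, volume)

# Day 2: +10% on 200 shares -> +20; day 3: -10% on 300 shares -> -30, total -10
print(pvt.tolist())
```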

Calculation of Acceleration Bands¶

Upper and lower envelope bands around a simple moving average.

In [ ]:
def abands(df):
    df['AB_Middle_Band'] = df['Close'].rolling(window = 20, center=False).mean()
    df['aupband'] = df['High'] * (1 + 4 * (df['High']-df['Low'])/(df['High']+df['Low']))
    df['AB_Upper_Band'] = df['aupband'].rolling(window=20, center=False).mean()
    df['adownband'] = df['Low'] * (1 - 4 * (df['High']-df['Low'])/(df['High']+df['Low']))
    df['AB_Lower_Band'] = df['adownband'].rolling(window=20, center=False).mean()
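
A worked example may help with the band formula: for a bar with High = 105 and Low = 95, the width term is 4 * (105 - 95) / (105 + 95) = 0.2, so the raw upper value is 105 * 1.2 = 126 and the raw lower value is 95 * 0.8 = 76. The sketch below reproduces that arithmetic with a variant that returns a new DataFrame instead of adding helper columns in place; the name `acceleration_bands` and the `window` and `factor` parameters are assumptions for illustration.

```python
import pandas as pd

def acceleration_bands(df, window=20, factor=4):
    # Width term: (High - Low) / (High + Low), scaled by `factor`
    width = factor * (df["High"] - df["Low"]) / (df["High"] + df["Low"])
    return pd.DataFrame({
        "AB_Middle_Band": df["Close"].rolling(window).mean(),
        "AB_Upper_Band": (df["High"] * (1 + width)).rolling(window).mean(),
        "AB_Lower_Band": (df["Low"] * (1 - width)).rolling(window).mean(),
    })

# Two identical made-up bars so the 2-day rolling mean equals the single-bar value
bars = pd.DataFrame({"High": [105.0, 105.0], "Low": [95.0, 95.0], "Close": [100.0, 100.0]})
bands = acceleration_bands(bars, window=2)
print(bands.iloc[-1].to_dict())
```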
In [ ]:
for stock in range(len(TechIndicator)):
    abands(TechIndicator[stock])
    TechIndicator[stock] = TechIndicator[stock].fillna(0)
TechIndicator[0].tail()
Out[ ]:
Date Open High Low Close Adj Close Volume Label Momentum_1D RSI_14D ... BB_Middle_Band BB_Upper_Band BB_Lower_Band Aroon_Oscillator PVT AB_Middle_Band aupband AB_Upper_Band adownband AB_Lower_Band
2515 2023-03-27 120.750000 120.980003 118.320000 118.870003 118.870003 4628200 AMAT -0.659996 60.772359 ... 117.454 128.078 106.830 76.0 -193759.348005 119.0190 126.359147 129.031830 113.059128 109.554331
2516 2023-03-28 118.889999 119.059998 115.580002 116.400002 116.400002 5523800 AMAT -2.470001 61.147988 ... 117.450 128.078 106.822 76.0 -140334.144633 119.0315 126.123214 129.108039 108.723235 109.565542
2517 2023-03-29 118.720001 121.010002 117.360001 119.849998 119.849998 7865300 AMAT 3.449997 60.986116 ... 117.497 128.193 106.800 76.0 118341.689068 119.1610 128.421785 129.322731 110.171778 109.462733
2518 2023-03-30 122.000000 123.379997 121.339996 122.110001 122.110001 6213100 AMAT 2.260002 61.022278 ... 117.585 128.528 106.642 72.0 350280.741247 119.3440 127.494010 129.215506 117.294006 110.135506
2519 2023-03-31 121.519997 122.519997 121.000000 122.190002 122.190002 626666 AMAT 0.080002 59.435375 ... 117.812 128.963 106.660 68.0 117570.513104 119.5055 125.578965 129.316423 117.978982 110.403924

5 rows × 21 columns

Clean up the tables¶

In [ ]:
columns2Drop = ['Momentum_1D', 'aupband', 'adownband']
for stock in range(len(TechIndicator)):
    TechIndicator[stock] = TechIndicator[stock].drop(labels = columns2Drop, axis=1)
TechIndicator[0].head()
Out[ ]:
Date Open High Low Close Adj Close Volume Label RSI_14D Volume_plain BB_Middle_Band BB_Upper_Band BB_Lower_Band Aroon_Oscillator PVT AB_Middle_Band AB_Upper_Band AB_Lower_Band
0 2013-04-01 13.48 13.49 13.27 13.36 11.490359 11026400 AMAT 0.0 11026400 0.0 0.0 0.0 0.0 0.000000 0.0 0.0 0.0
1 2013-04-02 13.46 13.46 13.19 13.24 11.387154 8535600 AMAT 0.0 8535600 0.0 0.0 0.0 0.0 0.000000 0.0 0.0 0.0
2 2013-04-03 13.24 13.24 13.02 13.15 11.309745 14428100 AMAT 0.0 14428100 0.0 0.0 0.0 0.0 -174743.371158 0.0 0.0 0.0
3 2013-04-04 13.10 13.24 13.03 13.22 11.369956 7399800 AMAT 0.0 7399800 0.0 0.0 0.0 0.0 -58685.440026 0.0 0.0 0.0
4 2013-04-05 13.02 13.22 12.91 13.20 11.352756 10349000 AMAT 0.0 10349000 0.0 0.0 0.0 0.0 23733.997437 0.0 0.0 0.0

Visualization of technical indicators¶

In [ ]:
# Use the Date column as the index for visualization
for stock in range(len(TechIndicator)):
    TechIndicator[stock].index = TechIndicator[stock]['Date']
    TechIndicator[stock] = TechIndicator[stock].drop(labels = ['Date'], axis = 1)
In [ ]:
TechIndicator[0].head()
Out[ ]:
Open High Low Close Adj Close Volume Label RSI_14D Volume_plain BB_Middle_Band BB_Upper_Band BB_Lower_Band Aroon_Oscillator PVT AB_Middle_Band AB_Upper_Band AB_Lower_Band
Date
2013-04-01 13.48 13.49 13.27 13.36 11.490359 11026400 AMAT 0.0 11026400 0.0 0.0 0.0 0.0 0.000000 0.0 0.0 0.0
2013-04-02 13.46 13.46 13.19 13.24 11.387154 8535600 AMAT 0.0 8535600 0.0 0.0 0.0 0.0 0.000000 0.0 0.0 0.0
2013-04-03 13.24 13.24 13.02 13.15 11.309745 14428100 AMAT 0.0 14428100 0.0 0.0 0.0 0.0 -174743.371158 0.0 0.0 0.0
2013-04-04 13.10 13.24 13.03 13.22 11.369956 7399800 AMAT 0.0 7399800 0.0 0.0 0.0 0.0 -58685.440026 0.0 0.0 0.0
2013-04-05 13.02 13.22 12.91 13.20 11.352756 10349000 AMAT 0.0 10349000 0.0 0.0 0.0 0.0 23733.997437 0.0 0.0 0.0

Plot Relative Strength Index(RSI)¶

In [ ]:
%matplotlib inline
fig = plt.figure(figsize=(20,25))
for i in range(len(TechIndicator)):
    ax = plt.subplot(4,2,i+1)
    ax.plot(TechIndicator[i].index, TechIndicator[i]['RSI_14D'])
    ax.set_title(str(TechIndicator[i]['Label'][0]))
    ax.set_xlabel("Date")
    ax.set_ylabel("Relative Strength Index")
    plt.xticks(rotation=30)
fig.tight_layout()

Plot Plain Volume¶

In [ ]:
fig = plt.figure(figsize=(20,25))
for i in range(len(TechIndicator)):
    ax = plt.subplot(len(TechIndicator),1,i+1)
    ax.plot(TechIndicator[i].index, TechIndicator[i]['Volume_plain'], 'b')
    ax.set_title(str(TechIndicator[i]['Label'][0]))
    ax.set_xlabel("Date")
    ax.set_ylabel("Volume")
    plt.xticks(rotation=30)
fig.tight_layout()

Plot Bollinger Bands¶

In [ ]:
plt.style.use('fivethirtyeight')

fig = plt.figure(figsize=(20,25))
for i in range(len(TechIndicator)):
    ax = plt.subplot(4,2,i+1)
    ax.fill_between(TechIndicator[i].index, TechIndicator[i]['BB_Upper_Band'], TechIndicator[i]['BB_Lower_Band'], color='grey', label="Band Range")
    # Plot the closing price and the middle band (moving average)
    ax.plot(TechIndicator[i].index, TechIndicator[i]['Close'], color='red', lw=2, label="Close")
    ax.plot(TechIndicator[i].index, TechIndicator[i]['BB_Middle_Band'], color='black', lw=2, label="Middle Band")
    ax.set_title("Bollinger Bands for " + str(TechIndicator[i]['Label'][0]))
    ax.legend()
    ax.set_xlabel("Date")
    ax.set_ylabel("Close Prices")
    plt.xticks(rotation=30)
fig.tight_layout()

Plot Aroon Oscillators¶

In [ ]:
plt.style.use('seaborn-v0_8-whitegrid')
fig = plt.figure(figsize=(20,25))
for i in range(len(TechIndicator)):
    ax = plt.subplot(4,2,i+1)
    ax.fill(TechIndicator[i].index, TechIndicator[i]['Aroon_Oscillator'],'b', alpha = 0.5, label = "Aroon Oscillator")
    ax.plot(TechIndicator[i].index, TechIndicator[i]['Close'], 'r', label="Close")
    ax.set_title("Aroon Oscillator for " +str(TechIndicator[i]['Label'][0]))
    ax.legend()
    ax.set_xlabel("Date")
    ax.set_ylabel("Close Prices")
    plt.xticks(rotation=30)
fig.tight_layout()

Plot Price Volume Trend¶

In [ ]:
fig = plt.figure(figsize=(20,25))
for i in range(len(TechIndicator)):
    ax = plt.subplot(8,1,i+1)
    ax.plot(TechIndicator[i].index, TechIndicator[i]['PVT'], 'black')
    ax.set_title("Price Volume Trend of " +str(TechIndicator[i]['Label'][0]))
    ax.set_xlabel("Date")
    ax.set_ylabel("Price Volume trend")
    plt.xticks(rotation=30)
fig.tight_layout()

Plot Acceleration bands¶

In [ ]:
fig = plt.figure(figsize=(20,25))
for i in range(len(TechIndicator)):
    ax = plt.subplot(4,2,i+1)
    ax.fill_between(TechIndicator[i].index, TechIndicator[i]['AB_Upper_Band'], TechIndicator[i]['AB_Lower_Band'], color='grey', label = "Band-Range")
    # Plot the closing price and the middle band (moving average)
    ax.plot(TechIndicator[i].index, TechIndicator[i]['Close'], color='red', lw=2, label = "Close")
    ax.plot(TechIndicator[i].index, TechIndicator[i]['AB_Middle_Band'], color='black', lw=2, label="Middle_Band")
    ax.set_title("Acceleration Bands for " + str(TechIndicator[i]['Label'][0]))
    ax.legend()
    ax.set_xlabel("Date")
    ax.set_ylabel("Close Prices")
    plt.xticks(rotation=30)
fig.tight_layout()

Making Predictions Using the Daily Close price¶

Let's predict AMAT stock¶

In [6]:
# Get the stock quote
stock = 'AMAT'
df = pdr.get_data_yahoo(stock, start=start, end=end)
# Show the data
df.tail()
[*********************100%***********************]  1 of 1 completed
Out[6]:
Open High Low Close Adj Close Volume
Date
2023-03-27 120.750000 120.980003 118.320000 118.870003 118.870003 4628200
2023-03-28 118.889999 119.059998 115.580002 116.400002 116.400002 5523800
2023-03-29 118.720001 121.010002 117.360001 119.849998 119.849998 7865300
2023-03-30 122.000000 123.379997 121.339996 122.110001 122.110001 6213100
2023-03-31 121.519997 123.519997 121.000000 121.970001 121.970001 1747142
In [7]:
plt.figure(figsize=(16,6))
plt.title('Close Price History for ' + stock)
plt.plot(df['Close'])
plt.xlabel('Date', fontsize=18)
plt.ylabel('Close Price USD ($)', fontsize=18)
plt.xticks(rotation=30)
plt.show()
In [8]:
# Create a new dataframe with only the 'Close' column
data = df.filter(['Close'])
# Convert the dataframe to a numpy array
dataset = data.values
# Get the number of rows to train the model on
training_data_len = int(np.ceil( len(dataset) * .80 ))

training_data_len
Out[8]:
2016
In [9]:
# Scale the data
from sklearn.preprocessing import MinMaxScaler

scaler = MinMaxScaler(feature_range=(0,1))
scaled_data = scaler.fit_transform(dataset)

scaled_data
Out[9]:
array([[0.00285566],
       [0.00207684],
       [0.00149273],
       ...,
       [0.69399012],
       [0.70865784],
       [0.70774923]])
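
The cell above fits the scaler on the full series, so the minimum and maximum of the later (test) period influence how the training rows are scaled. One common way to avoid this look-ahead is to fit the scaler on the training rows only and then transform everything with it. The sketch below illustrates the idea on a made-up ramp of prices; `prices` and `train_len` are hypothetical stand-ins for `dataset` and `training_data_len`.

```python
import numpy as np
from sklearn.preprocessing import MinMaxScaler

# Hypothetical closing prices 0..9 as a column vector
prices = np.arange(10, dtype=float).reshape(-1, 1)
train_len = 8

scaler = MinMaxScaler(feature_range=(0, 1))
scaler.fit(prices[:train_len])       # learn min/max from the training rows only
scaled = scaler.transform(prices)    # apply the same mapping to all rows

# Training rows span [0, 1]; later rows may fall outside that range
print(scaled.ravel())
```

With this approach, test values above the training maximum scale to more than 1, which is expected and simply reflects that the model has not seen that price range.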
In [10]:
# Create the scaled training data set
train_data = scaled_data[0:int(training_data_len), :]
# Split the data into x_train and y_train data sets
x_train = []
y_train = []

for i in range(60, len(train_data)):
    x_train.append(train_data[i-60:i, 0])
    y_train.append(train_data[i, 0])
print(len(train_data))    
print(len(x_train))
print(len(y_train))
        
# Convert the x_train and y_train to numpy arrays 
x_train, y_train = np.array(x_train), np.array(y_train)

# Reshape the data
x_train = np.reshape(x_train, (x_train.shape[0], x_train.shape[1], 1))
# x_train.shape
2016
1956
1956
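
The loop above turns the series into overlapping 60-day input windows, each paired with the value on the following day, which is why 2016 training rows yield 2016 - 60 = 1956 samples. The same construction can be sketched without a Python loop using NumPy's `sliding_window_view` (available in NumPy 1.20 and later); the helper name `make_windows` and the toy input are assumptions for illustration.

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view

def make_windows(series, lookback=60):
    series = np.asarray(series, dtype=float)
    # All length-`lookback` windows; drop the last one because it has no next value
    x = sliding_window_view(series, lookback)[:-1]
    y = series[lookback:]
    # Add a trailing feature axis: (samples, timesteps, features)
    return x[..., np.newaxis], y

x, y = make_windows(np.arange(10), lookback=3)
print(x.shape, y.shape)
```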
In [11]:
from keras.models import Sequential
from keras.layers import Dense, LSTM

# Build the LSTM model
model = Sequential()
model.add(LSTM(128, return_sequences=True, input_shape= (x_train.shape[1], 1)))
model.add(LSTM(64, return_sequences=False))
model.add(Dense(25))
model.add(Dense(1))

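# Compile the model
# Note: 'accuracy' is a classification metric and is not meaningful for this
# regression task (it is reported as 0 in the log below); watch the MSE loss instead.
model.compile(optimizer='adam', loss='mean_squared_error',metrics=['accuracy'])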

# Train the model
history = model.fit(x_train, y_train, batch_size=1, epochs=5)
Epoch 1/5
1956/1956 [==============================] - 25s 8ms/step - loss: 5.6376e-04 - accuracy: 0.0000e+00
Epoch 2/5
1956/1956 [==============================] - 15s 8ms/step - loss: 3.1521e-04 - accuracy: 0.0000e+00
Epoch 3/5
1956/1956 [==============================] - 15s 8ms/step - loss: 2.8879e-04 - accuracy: 0.0000e+00
Epoch 4/5
1956/1956 [==============================] - 16s 8ms/step - loss: 2.0220e-04 - accuracy: 0.0000e+00
Epoch 5/5
1956/1956 [==============================] - 15s 8ms/step - loss: 2.4678e-04 - accuracy: 0.0000e+00
In [12]:
# Create the testing data set

test_data = scaled_data[training_data_len - 60: , :]
print(len(test_data))
# Create the data sets x_test and y_test
x_test = []
y_test = dataset[training_data_len:, :]
for i in range(60, len(test_data)):
    x_test.append(test_data[i-60:i, 0])
    
# Convert the data to a numpy array
x_test = np.array(x_test)

# Reshape the data
x_test = np.reshape(x_test, (x_test.shape[0], x_test.shape[1], 1 ))

# Get the model's predicted price values
predictions = model.predict(x_test)
predictions = scaler.inverse_transform(predictions)

# Get the root mean squared error (RMSE)
rmse = np.sqrt(np.mean(((predictions - y_test) ** 2)))
rmse
564
16/16 [==============================] - 1s 5ms/step
Out[12]:
7.105468837684983
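
An RMSE figure is easier to interpret next to a simple reference. The sketch below, on made-up numbers, computes the RMSE of a persistence baseline that predicts each day's close as the previous day's close. The helper name `rmse` and the sample values are assumptions for illustration, not results from this notebook.

```python
import numpy as np

def rmse(actual, predicted):
    # Flatten both inputs so (n, 1) and (n,) arrays do not broadcast to (n, n)
    actual = np.asarray(actual, dtype=float).ravel()
    predicted = np.asarray(predicted, dtype=float).ravel()
    return float(np.sqrt(np.mean((predicted - actual) ** 2)))

closes = np.array([100.0, 102.0, 101.0, 104.0])
actual = closes[1:]
naive = closes[:-1]  # persistence baseline: tomorrow's close = today's close
baseline_rmse = rmse(actual, naive)
print(round(baseline_rmse, 4))
```

Applying the same comparison to the test period would indicate whether the LSTM's error is lower than that of simply carrying the last close forward.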
In [13]:
# Plot the data
train = data[:training_data_len]
# Copy the slice so that adding a column does not trigger SettingWithCopyWarning
valid = data[training_data_len:].copy()
valid['Predictions'] = predictions
# Visualize the data
plt.figure(figsize=(16,6))
plt.title('Model for '  + stock)
plt.xlabel('Date', fontsize=18)
plt.ylabel('Close Price USD ($)', fontsize=18)
plt.plot(train['Close'])
plt.plot(valid[['Close', 'Predictions']])
plt.legend(['Train', 'Val', 'Predictions'], loc='lower right')
plt.xticks(rotation=30)
plt.show()
In [14]:
# Show the valid and predicted prices
valid.head()
Out[14]:
Close Predictions
Date
2021-04-01 141.520004 138.981064
2021-04-05 143.050003 147.576141
2021-04-06 139.539993 150.892944
2021-04-07 139.139999 147.855850
2021-04-08 139.350006 146.004211
In [15]:
valid.tail()
Out[15]:
Close Predictions
Date
2023-03-27 118.870003 126.113167
2023-03-28 116.400002 124.869614
2023-03-29 119.849998 122.445175
2023-03-30 122.110001 124.865166
2023-03-31 121.970001 127.893478
In [18]:
# Plot the data
train = data[:training_data_len]
# Copy the slice so that adding a column does not trigger SettingWithCopyWarning
valid = data[training_data_len:].copy()
valid['Prediction'] = predictions
# Visualize the data
plt.figure(figsize=(16,6))
plt.title('Prediction Results for '  + stock)
plt.xlabel('Date', fontsize=18)
plt.ylabel('Close & Predicted Price USD ($)', fontsize=18)
plt.plot(valid['Close'])
plt.plot(valid['Prediction'])
plt.legend(['Close','Prediction'], loc='lower right')
plt.show()

Use a different approach¶

In [20]:
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

from sklearn.preprocessing import MinMaxScaler
from keras.models import Sequential
from keras.layers import Dense, Dropout, LSTM
In [27]:
stock = 'AMAT'
dataset = pdr.get_data_yahoo(stock, start=start, end=end)

dataset = dataset.reset_index()
dataset['Date'] = pd.to_datetime(dataset.Date,format='%Y-%m-%d')
dataset.index = dataset['Date']
dataset.head()
[*********************100%***********************]  1 of 1 completed
Out[27]:
Date Open High Low Close Adj Close Volume
Date
2013-04-01 2013-04-01 13.48 13.49 13.27 13.36 11.490364 11026400
2013-04-02 2013-04-02 13.46 13.46 13.19 13.24 11.387156 8535600
2013-04-03 2013-04-03 13.24 13.24 13.02 13.15 11.309752 14428100
2013-04-04 2013-04-04 13.10 13.24 13.03 13.22 11.369952 7399800
2013-04-05 2013-04-05 13.02 13.22 12.91 13.20 11.352752 10349000
In [30]:
dataset = dataset.sort_index(ascending=True, axis=0)
# Keep only the closing price, indexed by date
dataset2 = dataset[['Close']].copy()

dataset3 = dataset2.values 

train = dataset3[:training_data_len]
valid = dataset3[training_data_len:]
#print(train)
#print(valid)
In [31]:
# puts everything between (0,1)
scaler = MinMaxScaler(feature_range=(0, 1))
scaled_data = scaler.fit_transform(dataset3)
#print(scaled_data)

x_train, y_train = [], []
for i in range(60,len(train)):
    x_train.append(scaled_data[i-60:i,0])
    y_train.append(scaled_data[i,0])
x_train, y_train = np.array(x_train), np.array(y_train)

x_train = np.reshape(x_train, (x_train.shape[0],x_train.shape[1],1))
In [34]:
model = Sequential()
model.add(LSTM( units=1000, return_sequences = True, input_shape=(x_train.shape[1], 1)))
model.add(LSTM(units=1000))
model.add(Dense(1))

model.compile(loss='mean_squared_error', optimizer='adam')
model.fit(x_train, y_train, epochs=1, batch_size=1, verbose=2)
1956/1956 - 54s - loss: 0.0012 - 54s/epoch - 28ms/step
Out[34]:
<keras.callbacks.History at 0x7f709b289f10>
In [39]:
test_data = scaled_data[training_data_len - 60: , :]
# Create the data sets x_test and y_test
x_test = []
#y_test = dataset[training_data_len:, :]
for i in range(60, len(test_data)):
    x_test.append(test_data[i-60:i, 0])
    
# Convert the data to a numpy array
x_test = np.array(x_test)

# Reshape the data
x_test = np.reshape(x_test, (x_test.shape[0], x_test.shape[1], 1 ))

closing_price = model.predict(x_test)
closing_price = scaler.inverse_transform(closing_price)
#print(X_test.shape)

train = dataset2[:training_data_len]
# Copy the slice so that adding a column does not trigger SettingWithCopyWarning
valid = dataset2[training_data_len:].copy()
valid['Predictions'] = closing_price
16/16 [==============================] - 0s 27ms/step
In [40]:
plt.figure(figsize=(20,10))
plt.title('Model for '  + stock)
plt.xlabel('Date', fontsize=18)
plt.ylabel('Close Price USD ($)', fontsize=18)
plt.plot(train['Close'])
plt.plot(valid[['Close', 'Predictions']])
plt.legend(['Train', 'Val', 'Predictions'], loc='lower right')
plt.xticks(rotation=30)
plt.show()
In [41]:
valid.head()
Out[41]:
Close Predictions
Date
2021-04-01 141.520004 131.586243
2021-04-05 143.050003 138.041382
2021-04-06 139.539993 142.352661
2021-04-07 139.139999 142.241196
2021-04-08 139.350006 141.087372
In [42]:
valid.tail()
Out[42]:
Close Predictions
Date
2023-03-27 118.870003 120.268402
2023-03-28 116.400002 119.482704
2023-03-29 119.849998 117.660210
2023-03-30 122.110001 118.644295
2023-03-31 122.285004 120.666000
In [44]:
# Plot the data
train = dataset2[:training_data_len]
# Use this model's output (closing_price); copy the slice to avoid SettingWithCopyWarning
valid = dataset2[training_data_len:].copy()
valid['Prediction'] = closing_price
# Visualize the data
plt.figure(figsize=(16,6))
plt.title('Prediction Results for '  + stock)
plt.xlabel('Date', fontsize=18)
plt.ylabel('Close & Predicted Price USD ($)', fontsize=18)
plt.plot(valid['Close'])
plt.plot(valid['Prediction'])
plt.legend(['Close','Prediction'], loc='lower right')
plt.xticks(rotation=30)
plt.show()